
    Optimistic chordal coloring: a coalescing heuristic for SSA-form programs

    The interference graph for a procedure in Static Single Assignment (SSA) form is chordal. Since the k-colorability problem can be solved in polynomial time for chordal graphs, this result has generated interest in SSA-based heuristics for spilling and coalescing. Since copies can be folded during SSA construction, instances of the coalescing problem under SSA have fewer affinities than those arising in traditional approaches. This paper presents Optimistic Chordal Coloring (OCC), a coalescing heuristic for chordal graphs. OCC was evaluated on interference graphs from embedded/multimedia benchmarks: in all cases, OCC found the optimal solution, and ran, on average, 2.30× faster than Iterated Register Coalescing.
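
    As a rough illustration of why chordality matters here (this is a textbook property, not the paper's OCC heuristic): a chordal graph can be colored optimally by a greedy pass over a maximum-cardinality-search ordering, which is what makes k-colorability polynomial for SSA interference graphs. A minimal Python sketch, assuming a plain adjacency-set representation of the graph:

    # Sketch only: optimal greedy coloring of a chordal graph via maximum
    # cardinality search (MCS). This illustrates the chordality result the
    # abstract relies on; it is not the OCC coalescing heuristic itself.
    def mcs_order(graph):                      # graph: vertex -> set of neighbors
        weight = {v: 0 for v in graph}
        order = []
        while weight:
            v = max(weight, key=weight.get)    # vertex with most already-ordered neighbors
            order.append(v)
            del weight[v]
            for u in graph[v]:
                if u in weight:
                    weight[u] += 1
        return order                           # the reverse of this order is a perfect elimination order

    def greedy_color(graph, order):
        color = {}
        for v in order:                        # greedy coloring in MCS order is optimal on chordal graphs
            used = {color[u] for u in graph[v] if u in color}
            c = 0
            while c in used:
                c += 1
            color[v] = c
        return color

    # Example: a 4-cycle with one chord is chordal and needs exactly 3 colors.
    g = {'a': {'b', 'c', 'd'}, 'b': {'a', 'c'}, 'c': {'a', 'b', 'd'}, 'd': {'a', 'c'}}
    print(greedy_color(g, mcs_order(g)))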

    Optimized Memory Access For Dynamically Scheduled High Level Synthesis

    Dynamically scheduled elastic circuits generated by High-Level Synthesis (HLS) tools are inherently out-of-order, following the flow of data rather than the evolution of an instruction pointer. Components of the circuit that access memory need to be connected to a Load-Store Queue (LSQ) that dynamically checks for memory dependencies, performs store ordering and forwarding, and allows unordered access to Random-Access Memory (RAM) whenever possible. While connecting every memory-access (load/store) component to an LSQ ensures correct program execution, the hardware and power cost makes this solution unattractive. Statically ruling out dependencies allows circuits to access memory via lightweight components that use an arbitrator to handle RAM port sharing. Reducing the number of components using the LSQ allows the compiler to generate smaller queues, which results in superlinear savings in hardware and power for the memory subsystem. This work describes additions to the Elastic Compiler (EC) that allow it to analyze algorithms expressed in LLVM-IR, an intermediate code representation, to rule out memory dependencies between load/store instructions, and presents the insights underlying these analyses. The analyses leverage pointer analysis as well as array access patterns to narrow down the list of possibly dependent instructions. We also enhance the compiler to leverage our analyses and automatically generate the appropriate memory-access components for the circuit, connecting each of them to the relevant arbitrator or LSQ.
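
    The kind of static check such array-access-pattern analyses build on can be illustrated with the classic GCD dependence test for affine indices. The Python sketch below is a deliberate simplification (a single loop index, accesses modeled as stride*i + offset, and a hypothetical may_depend helper); it is not the analysis implemented in the EC compiler:

    from math import gcd

    def may_depend(acc1, acc2):
        # Each access is modeled as (base_array, stride, offset), i.e. base[stride*i + offset].
        base1, a1, b1 = acc1
        base2, a2, b2 = acc2
        if base1 != base2:          # assumes pointer analysis already proved the bases distinct
            return False
        if a1 == 0 and a2 == 0:     # both indices are constants
            return b1 == b2
        if (b2 - b1) % gcd(abs(a1), abs(a2)) != 0:
            return False            # GCD test: no integer solution, provably independent
        return True                 # otherwise conservatively assume a dependence

    # A[2*i] vs A[2*i + 1] can never overlap; A[2*i] vs A[4*i + 2] might (e.g. index 2).
    print(may_depend(("A", 2, 0), ("A", 2, 1)))   # False
    print(may_depend(("A", 2, 0), ("A", 4, 2)))   # True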

    Author Guidelines for MSS Symposium Proceedings

    The abstract is to be in fully-justified italicized text, at the top of the left-hand column, below the author and affiliation information. Use the word “Abstract” as the title, in 12-point Times, boldface type, centered relative to the column, initially capitalized. The abstract is to be in 10-point, single-spaced type. The abstract may be up to 3 inches (7.62 cm) long. Leave two blank lines after the Abstract, then begin the main text.

    Virtualized execution runtime for FPGA accelerators in the cloud

    FPGAs offer high performance coupled with energy efficiency, making them extremely attractive computational resources within a cloud ecosystem. However, to achieve this integration and make them easy to program, we first need to enable users with varying expertise to easily develop cloud applications that leverage FPGAs. With the growing size of FPGAs, allocating them monolithically to users can be wasteful due to potentially low device utilization. Hence, we also need to be able to dynamically share FPGAs among multiple users. To address these concerns, we propose a methodology and a runtime system that together simplify the FPGA application development process by providing: 1) a clean abstraction with high-level APIs for easy application development; 2) a simple execution model that supports both hardware and software execution; and 3) a shared-memory model that is convenient for programmers to use. Akin to an operating system on a computer, our lightweight runtime system enables the simultaneous execution of multiple applications by virtualizing computational resources, i.e., FPGA resources and on-board memory, and offers protection facilities to isolate applications from each other. In this paper, we illustrate how these features can be developed in a lightweight manner and quantitatively evaluate the performance overhead they introduce on a small set of applications running on our proof-of-concept prototype. Our results demonstrate that these features introduce only marginal performance overheads. More importantly, by sharing resources for simultaneous execution of multiple user applications, our platform improves FPGA utilization and delivers higher aggregate throughput compared to accessing the device in a time-shared manner.
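
    To make the described programming model concrete, the Python sketch below imagines what an application written against such a runtime could look like. Every name here (Runtime, alloc_shared, submit, VecAddKernel) is hypothetical rather than taken from the paper, and the hardware path is stubbed out by the software fallback that the execution model also supports:

    # Hypothetical usage sketch of an FPGA-virtualizing runtime: shared buffers
    # plus a submit() call that may run a task in hardware or in software.
    class Runtime:
        def alloc_shared(self, nbytes):
            # Stand-in for a shared-memory buffer visible to both host and FPGA.
            return bytearray(nbytes)

        def submit(self, kernel, *buffers):
            # A real runtime would dispatch to a free hardware slot when possible;
            # this sketch always falls back to the kernel's software implementation.
            return kernel.run_software(*buffers)

    class VecAddKernel:
        def run_software(self, a, b, out):
            for i in range(len(out)):
                out[i] = (a[i] + b[i]) & 0xFF
            return out

    rt = Runtime()
    a, b, out = rt.alloc_shared(4), rt.alloc_shared(4), rt.alloc_shared(4)
    a[:], b[:] = b"\x01\x02\x03\x04", b"\x10\x20\x30\x40"
    print(list(rt.submit(VecAddKernel(), a, b, out)))   # -> [17, 34, 51, 68]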

    A Predictable Communication Scheme for Embedded Multiprocessor Systems

    Networks-on-Chip are emerging as a widely accepted alternative to traditional bus architectures. However, they remain far from intuitive for system designers to apply because of their lack of predictability. This communication predictability can be obtained statically or dynamically. Dynamic allocation is more suitable for flexible multiprocessor systems and requires the implementation of a Quality-of-Service (QoS) mechanism. This paper explores the main QoS schemes suitable for such systems: connection-oriented and connectionless. The simulation results show that the connectionless scheme provides better predictability in terms of message latency with an acceptable buffer requirement. This work provides the designer with valuable guidelines for choosing the QoS parameters a priori with confidence in the predicted results.
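
    As a minimal sketch of the connectionless idea (assuming a priority-based arbiter at a single router output port, not the exact scheme simulated in the paper): higher-priority flits are always forwarded first, which is what makes their worst-case latency easier to bound than with plain round-robin arbitration.

    # Sketch: priority-based output-port arbitration for connectionless QoS.
    import heapq

    class OutputPort:
        def __init__(self):
            self._queue = []                 # entries: (priority, arrival_order, flit)
            self._order = 0

        def request(self, priority, flit):   # lower number = higher priority
            heapq.heappush(self._queue, (priority, self._order, flit))
            self._order += 1

        def grant(self):                     # forward the highest-priority waiting flit
            return heapq.heappop(self._queue)[2] if self._queue else None

    port = OutputPort()
    port.request(2, "best-effort flit")
    port.request(0, "control flit")
    print(port.grant())                      # -> control flit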